Automatic identification of head movements in video-recorded conversations: can words help?
Authors
Abstract
Head movements are the most frequent gestures in face-to-face communication, and are important for feedback giving (Allwood, 1988; Yngve, 1970; Duncan, 1972) and turn management (McClave, 2000). Their automatic recognition has been addressed by many multimodal communication researchers (Heylen et al., 2007; Paggio and Navarretta, 2011; Morency et al., 2007). The method for automatic head movement annotation described in this paper is implemented as a plugin to the freely available multimodal annotation tool ANVIL (Kipp, 2004), using OpenCV (Bradski and Kaehler, 2008), combined with a command-line script that performs a number of file transformations and invokes the LibSVM software (Chang and Lin, 2011) to train and test a support vector classifier. The script then produces a new annotation in ANVIL containing the learned head movements. The present method builds on (Jongejan, 2012) by adding jerk to the movement features and by applying machine learning. In this paper we also conduct a statistical analysis of the distribution of words in the annotated data to determine whether word features could be used to improve the learning model.

Research aimed at the automatic recognition of head movements, especially nods and shakes, has addressed the issue in essentially two different ways. A number of studies use data in which the face, or a part of it, has been tracked via various devices, and typically train HMM models on such data (Kapoor and Picard, 2001; Tan and Rong, 2003; Wei et al., 2013). The accuracy reported in these studies is in the range 75-89%. Other studies, by contrast, try to identify head movements from raw video material using computer vision techniques (Zhao et al., 2012; Morency et al., 2005). The results obtained depend on a number of factors, such as video quality, lighting conditions, and whether the movements are naturally occurring or rehearsed.
The best results so far are probably those in (Morency et al., 2007), where an LDCRF model achieves an accuracy from 65% to 75% for a false positive rate of 20-30% and outperforms earlier SVM and HMM models. Our work belongs to the latter strand of research in that we also work with raw video data.
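The movement features mentioned in the abstract (with jerk added on top of velocity and acceleration) can be illustrated as successive finite differences over tracked per-frame head positions. The sketch below is a minimal illustration, not the authors' ANVIL/OpenCV plugin code; the function name, the 25 fps sampling rate, and the (x, y) position input are assumptions for the example. The resulting per-frame feature rows are the kind of input one could pass to an SVM classifier such as the LibSVM-based one the paper uses.

```python
import numpy as np

def movement_features(positions, fps=25.0):
    """Velocity, acceleration, and jerk via finite differences.

    positions: array of shape (frames, 2) holding per-frame (x, y)
    head coordinates. Returns one 6-dimensional feature row per frame.
    (Hypothetical helper, illustrating the feature set only.)
    """
    dt = 1.0 / fps
    pos = np.asarray(positions, dtype=float)
    vel = np.gradient(pos, dt, axis=0)    # first derivative of position
    acc = np.gradient(vel, dt, axis=0)    # second derivative
    jerk = np.gradient(acc, dt, axis=0)   # third derivative
    return np.hstack([vel, acc, jerk])

# Example: a head bobbing vertically (a nod-like motion) over 2 seconds
t = np.arange(50) / 25.0
positions = np.stack([np.zeros_like(t), np.sin(2 * np.pi * t)], axis=1)
X = movement_features(positions)
print(X.shape)  # (50, 6)
```

Jerk (the rate of change of acceleration) helps distinguish abrupt, communicative movements such as nods from slow postural drift, which is the motivation for adding it to the feature set.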
Similar papers
Multimodal feedback expressions in Danish and Polish spontaneous conversations
This paper presents a pilot comparative study of feedback head movements and vocal expressions in Danish and Polish naturally occurring video recorded conversations. The conversations involve three well-acquainted participants who talk freely in their homes while eating and drinking. The analysis of the data indicates that the participants use the same types of head and spoken feedback in the t...
Classification of Feedback Expressions in Multimodal Data
This paper addresses the issue of how linguistic feedback expressions, prosody and head gestures, i.e. head movements and face expressions, relate to one another in a collection of eight video-recorded Danish map-task dialogues. The study shows that in these data, prosodic features and head gestures significantly improve automatic classification of dialogue act labels for linguistic expressions...
Recognizing Visual Signatures of Spontaneous Head Gestures
Head movements are an integral part of human nonverbal communication. As such, the ability to detect various types of head gestures from video is important for robotic systems that need to interact with people or for assistive technologies that may need to detect conversational gestures to aid communication. To this end, we propose a novel Multi-Scale Deep Convolution-LSTM architecture, capable...
Mapping and Manipulating Facial Expression
Non-verbal visual cues accompany speech to supplement the meaning of spoken words, signify emotional state, indicate position in discourse, and provide back-channel feedback. This visual information includes head movements, fac...
Challenges Ahead: Head Movements and Other Social Acts in Conversations
When involved in face-to-face conversations, people move their heads in typical ways. The pattern of head gestures and their function in conversation has been studied in various disciplines. Many factors are involved in determining the exact patterns that occur in conversation. These can be explained by considering some of the basic properties of face-to-face interactions. The fact that convers...